Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads
Abstract
The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long reads that can produce complete genome assemblies, but the sequencing is more expensive and error-prone. There is significant interest in combining data from these complementary sequencing technologies to generate more accurate "hybrid" assemblies. However, few tools exist that truly leverage the benefits of both types of data, namely the accuracy of short reads and the structural resolving power of long reads. Here we present Unicycler, a new tool for assembling bacterial genomes from a combination of short and long reads, which produces assemblies that are accurate, complete and cost-effective. Unicycler builds an initial assembly graph from short reads using the de novo assembler SPAdes and then simplifies the graph using information from short and long reads. Unicycler uses a novel semi-global aligner to align long reads to the assembly graph. Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler.
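To make the term "semi-global alignment" concrete, here is a minimal sketch of scoring two linear sequences with free end gaps. It is not Unicycler's aligner, which works against an assembly graph; the function name `semi_global_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen purely for illustration.

```python
def semi_global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best alignment score of a and b when gaps at either end cost nothing.

    Illustrative only: hypothetical name and scoring scheme, linear
    sequences rather than an assembly graph.
    """
    # Row 0 is all zeros: skipping a prefix of b is free.
    prev = [0] * (len(b) + 1)
    # Track the best value seen in the last column (free suffix of a).
    best = prev[-1]
    for ca in a:
        # Column 0 is zero in every row: skipping a prefix of a is free.
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        best = max(best, curr[-1])
        prev = curr
    # The alignment may also end anywhere in the last row (free suffix of b).
    return max(best, max(prev))


if __name__ == "__main__":
    # A short read contained in a longer sequence.
    print(semi_global_score("ACGT", "TTACGTTT"))
    # The end of one sequence overlapping the start of another.
    print(semi_global_score("AAACGT", "CGTTTT"))
```

Because end gaps are free, a read that covers only part of a longer sequence, or that overhangs its end, is not penalized for the unaligned flanks. That is the property that distinguishes semi-global alignment from global alignment.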
Related resources
Completing bacterial genome assemblies with multiplex MinION sequencing
Illumina sequencing platforms have enabled widespread bacterial whole genome sequencing. While Illumina data is appropriate for many analyses, its short read length limits its ability to resolve genomic structure. This has major implications for tracking the spread of mobile genetic elements, including those which carry antimicrobial resistance determinants. Fully resolving a bacterial genome r...
hybridSPAdes: an algorithm for hybrid assembly of short and long reads
MOTIVATION Recent advances in single molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long and inaccurate reads. However, these approaches require high coverage by long reads and remain expensive. On the other hand, the inexpensive short reads technologies produce accurate but fragmented assemblies. Thus, a hybrid approach that assembles l...
Short read fragment assembly of bacterial genomes.
In the last year, high-throughput sequencing technologies have progressed from proof-of-concept to production quality. While these methods produce high-quality reads, they have yet to produce reads comparable in length to Sanger-based sequencing. Current fragment assembly algorithms have been implemented and optimized for mate-paired Sanger-based reads, and thus do not perform well on short rea...
Completing bacterial genome assemblies: strategy and performance comparisons
Determining the genomic sequences of microorganisms is the basis and prerequisite for understanding their biology and functional characterization. While the advent of low-cost, extremely high-throughput second-generation sequencing technologies and the parallel development of assembly algorithms have generated rapid and cost-effective genome assemblies, such assemblies are often unfinished, fra...
Crystallizing short-read assemblies around lone Sanger reads
New short-read sequencing technologies produce large volumes of 25-30 base paired-end reads. In this paper, we present a sequencing protocol and de novo assembler program (SHORTY) targeted towards such microread data. Our protocol augments short-paired reads using a trivially small number of Sanger reads (only one to three reads per bacterial genome). Still, these “seed reads” enable us to prod...